# Data Sources for Replication (140-Country Expansion)

All raw data files must be downloaded before running the pipeline. Due to licensing restrictions, raw data files are not redistributed with this replication package.

The followup paper uses the same 10 data sources as the original paper, processed with expanded country coverage (140 countries vs. 69).

## Download Instructions

### 1. UN World Population Prospects 2024

- **URL**: `https://population.un.org/wpp/assets/Excel%20Files/1_Indicator%20(Standard)/CSV_FILES/WPP2024_Population1JanuaryByAge5GroupSex_Medium.csv.gz`
- **Output file**: `data/raw/un_wpp_population_by_age.csv`
- **Size**: ~70 MB compressed; ~1.77M rows (350-400 MB) uncompressed
- **Note**: The URL changed in 2024 from the old `Download/Files/` path. The pipeline handles this automatically.
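The URL above points to a gzip-compressed file, while the output file is a plain CSV, so a decompression step is needed at some point. A minimal standard-library sketch of that step follows; the helper name `gunzip_to` and the local `.csv.gz` path in the usage comment are illustrative assumptions, not part of the pipeline.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_to(src: Path, dst: Path) -> Path:
    """Stream-decompress a .gz file to dst without loading it into memory."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


# Hypothetical usage, assuming the archive was saved next to the output file:
# gunzip_to(
#     Path("data/raw/un_wpp_population_by_age.csv.gz"),
#     Path("data/raw/un_wpp_population_by_age.csv"),
# )
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for a file of this size.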

### 2. IMF World Economic Outlook (April 2025)

- **Access**: Via the Python `weo` package: `import weo; weo.download(2025, 1)`
- **Output file**: `data/raw/weo_data.csv`
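- **Variables**: `BCA_NGDPD` (current account balance, % of GDP), `GGXCNL_NGDP` (general government net lending/borrowing, % of GDP), `NGDP_RPCH` (real GDP growth, % change), `NGAP_NPGDP` (output gap, % of potential GDP)
- **Note**: `weo.getc()` returns a wide table with a `PeriodIndex` on the rows and one column per ISO3 code, so it must be melted into long (country-year) format.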
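The reshaping described in the note can be sketched with pandas as below. This is an illustration under assumptions: the function name `wide_to_long` is hypothetical, and the small `wide` frame only stands in for the output of `getc()` with made-up values.

```python
import pandas as pd


def wide_to_long(wide: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Reshape a year-by-country table into one row per (iso3, year)."""
    tidy = wide.copy()
    # Replace the PeriodIndex with plain integer years.
    tidy.index = pd.Index(tidy.index.year, name="year")
    long = tidy.reset_index().melt(
        id_vars="year", var_name="iso3", value_name=value_name
    )
    # Drop country-years with no reported value.
    long = long.dropna(subset=[value_name])
    return long.sort_values(["iso3", "year"]).reset_index(drop=True)


# Stand-in for a wide WEO table; the numbers are placeholders, not real data.
wide = pd.DataFrame(
    {"USA": [2.5, 2.9], "DEU": [-0.3, None]},
    index=pd.period_range("2023", periods=2, freq="Y"),
)
long = wide_to_long(wide, "NGDP_RPCH")
```

The result has columns `year`, `iso3`, and the chosen value name, which makes it straightforward to merge with the other country-year sources.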

### 3. Penn World Table 10.0

- **URL**: `https://www.rug.nl/ggdc/docs/pwt100.xlsx`
- **Output file**: `data/raw/pwt1001.xlsx`
- **Size**: ~25 MB
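- **Note**: The primary host `dataverse.nl` frequently times out. The `rug.nl` fallback URL works reliably for PWT 10.0. Save the download under the output filename above, which differs from the name in the URL (`pwt100.xlsx`).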

### 4. World Bank World Development Indicators

- **Access**: Via the Python `wbgapi` package
- **Indicators**: `SH.XPD.GHED.GD.ZS` (domestic general government health expenditure, % of GDP), `SP.DYN.LE00.IN` (life expectancy at birth), `NE.TRD.GNFS.ZS` (trade openness: trade, % of GDP)
- **Output file**: `data/raw/wdi_data.csv`

### 5. Chinn-Ito KAOPEN Index

- **URL**: `https://web.pdx.edu/~ito/kaopen_2023.xls`
- **Output file**: `data/raw/kaopen.xls`
- **Note**: The file is in `.xls` format (not `.xlsx`), so reading it requires the `xlrd` package. The `ccode` column contains ISO3 country codes.

### 6. Lane & Milesi-Ferretti External Wealth of Nations

- **URL**: `https://www.brookings.edu/wp-content/uploads/2026/02/EWN-dataset-year-end-2024_feb06.xlsx`
- **Output file**: `data/raw/ewn.xlsx`
- **Note**: The data are in the `Dataset` sheet (the default sheet is an introduction page). The file uses IFS numeric country codes, not ISO3; the code includes a manual mapping table covering ~150 IFS-to-ISO3 conversions.
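To illustrate the IFS-to-ISO3 conversion mentioned in the note, here is a minimal sketch. The five-entry dictionary is only an excerpt chosen for illustration, and the names `IFS_TO_ISO3`, `ifs_to_iso3`, and `unmapped_codes` are hypothetical; the pipeline's actual mapping table is not reproduced here.

```python
# Small excerpt of an IFS-numeric-to-ISO3 mapping (illustrative, not the full table).
IFS_TO_ISO3 = {
    111: "USA",
    112: "GBR",
    132: "FRA",
    134: "DEU",
    158: "JPN",
}


def ifs_to_iso3(code, mapping=IFS_TO_ISO3):
    """Return the ISO3 code for an IFS numeric code, or None if unmapped.

    Codes read from Excel often arrive as floats (e.g. 111.0), hence int().
    """
    return mapping.get(int(code))


def unmapped_codes(codes, mapping=IFS_TO_ISO3):
    """List the distinct IFS codes that have no ISO3 entry in the mapping."""
    return sorted({int(c) for c in codes} - set(mapping))
```

Reporting unmapped codes explicitly, rather than silently dropping them, makes it easier to spot countries lost during the conversion.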

### 7. IMF Monetary and Financial Statistics (Interest Rates)

- **Access**: Via the Python `imfp` package with `database_id='MFS_IR'`
- **Output file**: `data/raw/imf_ifs_rates.csv`
- **Note**: The old `IFS` database ID no longer exists; rates are now in `MFS_IR`. Columns are `country`, `time_period`, `obs_value`.

### 8. FRED (Federal Reserve Economic Data)

- **Access**: Via the Python `fredapi` package (requires API key from https://fred.stlouisfed.org/docs/api/api_key.html)
- **Series**: `IRLTLT01[CC]M156N` (10-year government bond yields) and `IR3TIB01[CC]M156N` (3-month interbank rates) for 23 OECD countries, where `[CC]` is the two-letter country code
- **Output file**: `data/raw/fred_rates.csv`
- **Note**: Run the pipeline with the `--fred-key YOUR_API_KEY` flag.
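The series patterns above expand into concrete FRED series IDs by substituting a country code for `[CC]`. A small sketch, assuming `[CC]` is a two-letter country code such as `US` or `DE`; the function name `fred_series_ids` and the `"long"`/`"short"` keys are hypothetical.

```python
LONG_RATE_PATTERN = "IRLTLT01{cc}M156N"   # 10-year government bond yield
SHORT_RATE_PATTERN = "IR3TIB01{cc}M156N"  # 3-month rate


def fred_series_ids(country_codes):
    """Map each country code to its long- and short-rate series IDs."""
    ids = {}
    for cc in country_codes:
        cc = cc.upper()
        ids[cc] = {
            "long": LONG_RATE_PATTERN.format(cc=cc),
            "short": SHORT_RATE_PATTERN.format(cc=cc),
        }
    return ids


ids = fred_series_ids(["US", "de"])
# Each ID could then be passed to fredapi, e.g.
# Fred(api_key=YOUR_API_KEY).get_series(ids["US"]["long"])
```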

### 9. OECD Social Expenditure (SOCX): Pension Spending

- **Access**: OECD SDMX REST API
- **URL**: `https://sdmx.oecd.org/public/rest/data/OECD.ELS.SPD,DSD_SOCX_AGG@DF_SOCX_AGG,/.A..PT_B1GQ.ES10._T._T.?startPeriod=1980&dimensionAtObservation=AllDimensions&format=csvfilewithlabels`
- **Output file**: `data/raw/oecd_pensions.csv`
- **Note**: Requests must include a `User-Agent` header (the API rejects requests without one). To broaden country coverage, the pipeline falls back to World Bank ASPIRE social insurance coverage.
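A request that carries the required `User-Agent` header can be sketched with the standard library as follows. The header value `replication-pipeline/0.1`, the function names, and the timeout are assumptions for illustration; the pipeline may build its request differently.

```python
from urllib.request import Request, urlopen

SOCX_URL = (
    "https://sdmx.oecd.org/public/rest/data/"
    "OECD.ELS.SPD,DSD_SOCX_AGG@DF_SOCX_AGG,/.A..PT_B1GQ.ES10._T._T."
    "?startPeriod=1980&dimensionAtObservation=AllDimensions"
    "&format=csvfilewithlabels"
)


def build_socx_request(user_agent: str = "replication-pipeline/0.1") -> Request:
    """Build the SOCX request with an explicit User-Agent header."""
    return Request(SOCX_URL, headers={"User-Agent": user_agent})


def fetch_socx(dest: str = "data/raw/oecd_pensions.csv", timeout: float = 60.0) -> None:
    """Download the CSV response and write it to dest (performs a network call)."""
    with urlopen(build_socx_request(), timeout=timeout) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

Separating request construction from the download makes the header logic checkable without touching the network.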

### 10. World Bank Savings and Investment Indicators

- **Access**: Via the Python `wbgapi` package
- **Indicators**: `NY.GNS.ICTR.ZS` (gross savings/GDP), `NY.ADJ.SVNG.GN.ZS` (adjusted net savings/GNI), `NE.GDI.TOTL.ZS` (gross investment/GDP), `NE.GDI.FTOT.ZS` (gross fixed investment/GDP)
- **Output file**: `data/raw/wdi_savings_investment.csv`

## Verification

After downloading all data, the following files should be present in `data/raw/`:

| File | Expected Size (approx.) |
|------|------------------------|
| `un_wpp_population_by_age.csv` | 350-400 MB |
| `weo_data.csv` | 10-15 MB |
| `pwt1001.xlsx` | 25 MB |
| `wdi_data.csv` | 2-5 MB |
| `kaopen.xls` | 0.5 MB |
| `ewn.xlsx` | 5-10 MB |
| `imf_ifs_rates.csv` | 1-2 MB |
| `fred_rates.csv` | 0.1 MB |
| `oecd_pensions.csv` | 0.1 MB |
| `wdi_savings_investment.csv` | 0.5 MB |
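The presence check can be automated with a short script. This is a sketch, not part of the pipeline: the function names are hypothetical, and it only reports each file's size in MB for comparison against the table rather than enforcing the approximate ranges.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = [
    "un_wpp_population_by_age.csv",
    "weo_data.csv",
    "pwt1001.xlsx",
    "wdi_data.csv",
    "kaopen.xls",
    "ewn.xlsx",
    "imf_ifs_rates.csv",
    "fred_rates.csv",
    "oecd_pensions.csv",
    "wdi_savings_investment.csv",
]


def check_raw_files(raw_dir="data/raw"):
    """Return {file name: size in MB, or None if the file is missing}."""
    raw = Path(raw_dir)
    report = {}
    for name in EXPECTED_FILES:
        path = raw / name
        report[name] = path.stat().st_size / 1e6 if path.is_file() else None
    return report


def missing_files(report):
    """List the expected files that were not found."""
    return [name for name, size in report.items() if size is None]


if __name__ == "__main__":
    for name, size in check_raw_files().items():
        print(f"{name}: {'MISSING' if size is None else f'{size:.1f} MB'}")
```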

## Differences from Original Paper's Data

The followup paper uses the same raw data sources as the original. The key difference is in sample construction:

- **Original**: 69 countries (EBA-49 + 20 SSA), ~1,857 baseline observations
- **Followup**: 140 countries (all with sufficient data coverage), ~2,730 baseline observations
- **Expanded coverage**: Adds transition economies, small island states, and additional developing countries

The `src/macro.py` module in the followup pipeline uses a relaxed country filter (`filter_eba_sample()`) that retains countries meeting minimum data coverage thresholds rather than restricting to the EBA+SSA list.
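The idea of a coverage-based filter can be illustrated as follows. This is not the implementation of `filter_eba_sample()`: the function name `countries_with_coverage`, the `required` variable list, and the `min_obs` threshold are all hypothetical, since the actual thresholds are not documented in this file.

```python
from collections import Counter


def countries_with_coverage(rows, required, min_obs):
    """Keep countries with at least min_obs complete country-year rows.

    rows: iterable of dicts with an "iso3" key plus one key per variable.
    required: variable names that must all be non-missing for a row to count.
    """
    counts = Counter(
        row["iso3"]
        for row in rows
        if all(row.get(var) is not None for var in required)
    )
    return sorted(country for country, n in counts.items() if n >= min_obs)


# Placeholder rows with made-up values and country codes.
sample = [
    {"iso3": "AAA", "year": 2000, "ca": 1.0, "growth": 2.0},
    {"iso3": "AAA", "year": 2001, "ca": 1.1, "growth": None},
    {"iso3": "BBB", "year": 2000, "ca": 0.5, "growth": 1.0},
    {"iso3": "BBB", "year": 2001, "ca": 0.4, "growth": 1.5},
]
kept = countries_with_coverage(sample, required=["ca", "growth"], min_obs=2)
```

A data-driven rule like this lets the sample grow with the available data instead of depending on a fixed country list.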

## Data Access Date

All data were downloaded on **February 9, 2026**. Subsequent updates to source databases may produce slightly different results.
